Breast Cancer Prediction Wisconsin Data Set

This is a side project analyzing the Breast Cancer Wisconsin (Diagnostic) DataSet, obtained from Kaggle. I am going to try two different machine learning classification models and compare the results. I've divided my presentation into two sections.

Data preprocessing

Applying machine learning models

1. Data preprocessing

Let’s start by exploring the data.

In [2]:
import pandas as pd
# read the file
df = pd.read_csv("/Users/jiezhao/Downloads/data.csv")
# print the column names of the dataset
print(df.columns)
Index(['id', 'diagnosis', 'radius_mean', 'texture_mean', 'perimeter_mean',
       'area_mean', 'smoothness_mean', 'compactness_mean', 'concavity_mean',
       'concave points_mean', 'symmetry_mean', 'fractal_dimension_mean',
       'radius_se', 'texture_se', 'perimeter_se', 'area_se', 'smoothness_se',
       'compactness_se', 'concavity_se', 'concave points_se', 'symmetry_se',
       'fractal_dimension_se', 'radius_worst', 'texture_worst',
       'perimeter_worst', 'area_worst', 'smoothness_worst',
       'compactness_worst', 'concavity_worst', 'concave points_worst',
       'symmetry_worst', 'fractal_dimension_worst'],
      dtype='object')
In [3]:
# check the data set
df
Out[3]:
id diagnosis radius_mean texture_mean perimeter_mean area_mean smoothness_mean compactness_mean concavity_mean concave points_mean ... radius_worst texture_worst perimeter_worst area_worst smoothness_worst compactness_worst concavity_worst concave points_worst symmetry_worst fractal_dimension_worst
0 842302 M 17.990 10.38 122.80 1001.0 0.11840 0.27760 0.300100 0.147100 ... 25.380 17.33 184.60 2019.0 0.16220 0.66560 0.71190 0.26540 0.4601 0.11890
1 842517 M 20.570 17.77 132.90 1326.0 0.08474 0.07864 0.086900 0.070170 ... 24.990 23.41 158.80 1956.0 0.12380 0.18660 0.24160 0.18600 0.2750 0.08902
2 84300903 M 19.690 21.25 130.00 1203.0 0.10960 0.15990 0.197400 0.127900 ... 23.570 25.53 152.50 1709.0 0.14440 0.42450 0.45040 0.24300 0.3613 0.08758
3 84348301 M 11.420 20.38 77.58 386.1 0.14250 0.28390 0.241400 0.105200 ... 14.910 26.50 98.87 567.7 0.20980 0.86630 0.68690 0.25750 0.6638 0.17300
4 84358402 M 20.290 14.34 135.10 1297.0 0.10030 0.13280 0.198000 0.104300 ... 22.540 16.67 152.20 1575.0 0.13740 0.20500 0.40000 0.16250 0.2364 0.07678
5 843786 M 12.450 15.70 82.57 477.1 0.12780 0.17000 0.157800 0.080890 ... 15.470 23.75 103.40 741.6 0.17910 0.52490 0.53550 0.17410 0.3985 0.12440
6 844359 M 18.250 19.98 119.60 1040.0 0.09463 0.10900 0.112700 0.074000 ... 22.880 27.66 153.20 1606.0 0.14420 0.25760 0.37840 0.19320 0.3063 0.08368
7 84458202 M 13.710 20.83 90.20 577.9 0.11890 0.16450 0.093660 0.059850 ... 17.060 28.14 110.60 897.0 0.16540 0.36820 0.26780 0.15560 0.3196 0.11510
8 844981 M 13.000 21.82 87.50 519.8 0.12730 0.19320 0.185900 0.093530 ... 15.490 30.73 106.20 739.3 0.17030 0.54010 0.53900 0.20600 0.4378 0.10720
9 84501001 M 12.460 24.04 83.97 475.9 0.11860 0.23960 0.227300 0.085430 ... 15.090 40.68 97.65 711.4 0.18530 1.05800 1.10500 0.22100 0.4366 0.20750
10 845636 M 16.020 23.24 102.70 797.8 0.08206 0.06669 0.032990 0.033230 ... 19.190 33.88 123.80 1150.0 0.11810 0.15510 0.14590 0.09975 0.2948 0.08452
11 84610002 M 15.780 17.89 103.60 781.0 0.09710 0.12920 0.099540 0.066060 ... 20.420 27.28 136.50 1299.0 0.13960 0.56090 0.39650 0.18100 0.3792 0.10480
12 846226 M 19.170 24.80 132.40 1123.0 0.09740 0.24580 0.206500 0.111800 ... 20.960 29.94 151.70 1332.0 0.10370 0.39030 0.36390 0.17670 0.3176 0.10230
13 846381 M 15.850 23.95 103.70 782.7 0.08401 0.10020 0.099380 0.053640 ... 16.840 27.66 112.00 876.5 0.11310 0.19240 0.23220 0.11190 0.2809 0.06287
14 84667401 M 13.730 22.61 93.60 578.3 0.11310 0.22930 0.212800 0.080250 ... 15.030 32.01 108.80 697.7 0.16510 0.77250 0.69430 0.22080 0.3596 0.14310
15 84799002 M 14.540 27.54 96.73 658.8 0.11390 0.15950 0.163900 0.073640 ... 17.460 37.13 124.10 943.2 0.16780 0.65770 0.70260 0.17120 0.4218 0.13410
16 848406 M 14.680 20.13 94.74 684.5 0.09867 0.07200 0.073950 0.052590 ... 19.070 30.88 123.40 1138.0 0.14640 0.18710 0.29140 0.16090 0.3029 0.08216
17 84862001 M 16.130 20.68 108.10 798.8 0.11700 0.20220 0.172200 0.102800 ... 20.960 31.48 136.80 1315.0 0.17890 0.42330 0.47840 0.20730 0.3706 0.11420
18 849014 M 19.810 22.15 130.00 1260.0 0.09831 0.10270 0.147900 0.094980 ... 27.320 30.88 186.80 2398.0 0.15120 0.31500 0.53720 0.23880 0.2768 0.07615
19 8510426 B 13.540 14.36 87.46 566.3 0.09779 0.08129 0.066640 0.047810 ... 15.110 19.26 99.70 711.2 0.14400 0.17730 0.23900 0.12880 0.2977 0.07259
20 8510653 B 13.080 15.71 85.63 520.0 0.10750 0.12700 0.045680 0.031100 ... 14.500 20.49 96.09 630.5 0.13120 0.27760 0.18900 0.07283 0.3184 0.08183
21 8510824 B 9.504 12.44 60.34 273.9 0.10240 0.06492 0.029560 0.020760 ... 10.230 15.66 65.13 314.9 0.13240 0.11480 0.08867 0.06227 0.2450 0.07773
22 8511133 M 15.340 14.26 102.50 704.4 0.10730 0.21350 0.207700 0.097560 ... 18.070 19.08 125.10 980.9 0.13900 0.59540 0.63050 0.23930 0.4667 0.09946
23 851509 M 21.160 23.04 137.20 1404.0 0.09428 0.10220 0.109700 0.086320 ... 29.170 35.59 188.00 2615.0 0.14010 0.26000 0.31550 0.20090 0.2822 0.07526
24 852552 M 16.650 21.38 110.00 904.6 0.11210 0.14570 0.152500 0.091700 ... 26.460 31.56 177.00 2215.0 0.18050 0.35780 0.46950 0.20950 0.3613 0.09564
25 852631 M 17.140 16.40 116.00 912.7 0.11860 0.22760 0.222900 0.140100 ... 22.250 21.40 152.40 1461.0 0.15450 0.39490 0.38530 0.25500 0.4066 0.10590
26 852763 M 14.580 21.53 97.41 644.8 0.10540 0.18680 0.142500 0.087830 ... 17.620 33.21 122.40 896.9 0.15250 0.66430 0.55390 0.27010 0.4264 0.12750
27 852781 M 18.610 20.25 122.10 1094.0 0.09440 0.10660 0.149000 0.077310 ... 21.310 27.26 139.90 1403.0 0.13380 0.21170 0.34460 0.14900 0.2341 0.07421
28 852973 M 15.300 25.27 102.40 732.4 0.10820 0.16970 0.168300 0.087510 ... 20.270 36.71 149.30 1269.0 0.16410 0.61100 0.63350 0.20240 0.4027 0.09876
29 853201 M 17.570 15.05 115.00 955.1 0.09847 0.11570 0.098750 0.079530 ... 20.010 19.52 134.90 1227.0 0.12550 0.28120 0.24890 0.14560 0.2756 0.07919
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
539 921362 B 7.691 25.44 48.34 170.4 0.08668 0.11990 0.092520 0.013640 ... 8.678 31.89 54.49 223.6 0.15960 0.30640 0.33930 0.05000 0.2790 0.10660
540 921385 B 11.540 14.44 74.65 402.9 0.09984 0.11200 0.067370 0.025940 ... 12.260 19.68 78.78 457.8 0.13450 0.21180 0.17970 0.06918 0.2329 0.08134
541 921386 B 14.470 24.99 95.81 656.4 0.08837 0.12300 0.100900 0.038900 ... 16.220 31.73 113.50 808.9 0.13400 0.42020 0.40400 0.12050 0.3187 0.10230
542 921644 B 14.740 25.42 94.70 668.6 0.08275 0.07214 0.041050 0.030270 ... 16.510 32.29 107.40 826.4 0.10600 0.13760 0.16110 0.10950 0.2722 0.06956
543 922296 B 13.210 28.06 84.88 538.4 0.08671 0.06877 0.029870 0.032750 ... 14.370 37.17 92.48 629.6 0.10720 0.13810 0.10620 0.07958 0.2473 0.06443
544 922297 B 13.870 20.70 89.77 584.8 0.09578 0.10180 0.036880 0.023690 ... 15.050 24.75 99.17 688.6 0.12640 0.20370 0.13770 0.06845 0.2249 0.08492
545 922576 B 13.620 23.23 87.19 573.2 0.09246 0.06747 0.029740 0.024430 ... 15.350 29.09 97.58 729.8 0.12160 0.15170 0.10490 0.07174 0.2642 0.06953
546 922577 B 10.320 16.35 65.31 324.9 0.09434 0.04994 0.010120 0.005495 ... 11.250 21.77 71.12 384.9 0.12850 0.08842 0.04384 0.02381 0.2681 0.07399
547 922840 B 10.260 16.58 65.85 320.8 0.08877 0.08066 0.043580 0.024380 ... 10.830 22.04 71.08 357.4 0.14610 0.22460 0.17830 0.08333 0.2691 0.09479
548 923169 B 9.683 19.34 61.05 285.7 0.08491 0.05030 0.023370 0.009615 ... 10.930 25.59 69.10 364.2 0.11990 0.09546 0.09350 0.03846 0.2552 0.07920
549 923465 B 10.820 24.21 68.89 361.6 0.08192 0.06602 0.015480 0.008160 ... 13.030 31.45 83.90 505.6 0.12040 0.16330 0.06194 0.03264 0.3059 0.07626
550 923748 B 10.860 21.48 68.51 360.5 0.07431 0.04227 0.000000 0.000000 ... 11.660 24.77 74.08 412.3 0.10010 0.07348 0.00000 0.00000 0.2458 0.06592
551 923780 B 11.130 22.44 71.49 378.4 0.09566 0.08194 0.048240 0.022570 ... 12.020 28.26 77.80 436.6 0.10870 0.17820 0.15640 0.06413 0.3169 0.08032
552 924084 B 12.770 29.43 81.35 507.9 0.08276 0.04234 0.019970 0.014990 ... 13.870 36.00 88.10 594.7 0.12340 0.10640 0.08653 0.06498 0.2407 0.06484
553 924342 B 9.333 21.94 59.01 264.0 0.09240 0.05605 0.039960 0.012820 ... 9.845 25.05 62.86 295.8 0.11030 0.08298 0.07993 0.02564 0.2435 0.07393
554 924632 B 12.880 28.92 82.50 514.3 0.08123 0.05824 0.061950 0.023430 ... 13.890 35.74 88.84 595.7 0.12270 0.16200 0.24390 0.06493 0.2372 0.07242
555 924934 B 10.290 27.61 65.67 321.4 0.09030 0.07658 0.059990 0.027380 ... 10.840 34.91 69.57 357.6 0.13840 0.17100 0.20000 0.09127 0.2226 0.08283
556 924964 B 10.160 19.59 64.73 311.7 0.10030 0.07504 0.005025 0.011160 ... 10.650 22.88 67.88 347.3 0.12650 0.12000 0.01005 0.02232 0.2262 0.06742
557 925236 B 9.423 27.88 59.26 271.3 0.08123 0.04971 0.000000 0.000000 ... 10.490 34.24 66.50 330.6 0.10730 0.07158 0.00000 0.00000 0.2475 0.06969
558 925277 B 14.590 22.68 96.39 657.1 0.08473 0.13300 0.102900 0.037360 ... 15.480 27.27 105.90 733.5 0.10260 0.31710 0.36620 0.11050 0.2258 0.08004
559 925291 B 11.510 23.93 74.52 403.5 0.09261 0.10210 0.111200 0.041050 ... 12.480 37.16 82.28 474.2 0.12980 0.25170 0.36300 0.09653 0.2112 0.08732
560 925292 B 14.050 27.15 91.38 600.4 0.09929 0.11260 0.044620 0.043040 ... 15.300 33.17 100.20 706.7 0.12410 0.22640 0.13260 0.10480 0.2250 0.08321
561 925311 B 11.200 29.37 70.67 386.0 0.07449 0.03558 0.000000 0.000000 ... 11.920 38.30 75.19 439.6 0.09267 0.05494 0.00000 0.00000 0.1566 0.05905
562 925622 M 15.220 30.62 103.40 716.9 0.10480 0.20870 0.255000 0.094290 ... 17.520 42.79 128.70 915.0 0.14170 0.79170 1.17000 0.23560 0.4089 0.14090
563 926125 M 20.920 25.09 143.00 1347.0 0.10990 0.22360 0.317400 0.147400 ... 24.290 29.41 179.10 1819.0 0.14070 0.41860 0.65990 0.25420 0.2929 0.09873
564 926424 M 21.560 22.39 142.00 1479.0 0.11100 0.11590 0.243900 0.138900 ... 25.450 26.40 166.10 2027.0 0.14100 0.21130 0.41070 0.22160 0.2060 0.07115
565 926682 M 20.130 28.25 131.20 1261.0 0.09780 0.10340 0.144000 0.097910 ... 23.690 38.25 155.00 1731.0 0.11660 0.19220 0.32150 0.16280 0.2572 0.06637
566 926954 M 16.600 28.08 108.30 858.1 0.08455 0.10230 0.092510 0.053020 ... 18.980 34.12 126.70 1124.0 0.11390 0.30940 0.34030 0.14180 0.2218 0.07820
567 927241 M 20.600 29.33 140.10 1265.0 0.11780 0.27700 0.351400 0.152000 ... 25.740 39.42 184.60 1821.0 0.16500 0.86810 0.93870 0.26500 0.4087 0.12400
568 92751 B 7.760 24.54 47.92 181.0 0.05263 0.04362 0.000000 0.000000 ... 9.456 30.37 59.16 268.6 0.08996 0.06444 0.00000 0.00000 0.2871 0.07039

569 rows × 32 columns

In [3]:
# check whether any column contains missing values
df.isnull().sum()
Out[3]:
id                           0
diagnosis                    0
radius_mean                  0
texture_mean                 0
perimeter_mean               0
area_mean                    0
smoothness_mean              0
compactness_mean             0
concavity_mean               0
concave points_mean          0
symmetry_mean                0
fractal_dimension_mean       0
radius_se                    0
texture_se                   0
perimeter_se                 0
area_se                      0
smoothness_se                0
compactness_se               0
concavity_se                 0
concave points_se            0
symmetry_se                  0
fractal_dimension_se         0
radius_worst                 0
texture_worst                0
perimeter_worst              0
area_worst                   0
smoothness_worst             0
compactness_worst            0
concavity_worst              0
concave points_worst         0
symmetry_worst               0
fractal_dimension_worst      0
Unnamed: 32                569
dtype: int64
In [4]:
# the last column ('Unnamed: 32') is entirely missing values and 'id' carries no
# predictive information, so we keep only columns 1-31 (diagnosis + 30 features)
df1 = df.copy()
df1 = df1.iloc[:,1:32]
# encode the target: malignant -> 1, benign -> 0
df1['diagnosis'] = df['diagnosis'].map({'M':1,'B':0})
In [5]:
df1.head()
Out[5]:
diagnosis radius_mean texture_mean perimeter_mean area_mean smoothness_mean compactness_mean concavity_mean concave points_mean symmetry_mean ... radius_worst texture_worst perimeter_worst area_worst smoothness_worst compactness_worst concavity_worst concave points_worst symmetry_worst fractal_dimension_worst
0 1 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419 ... 25.38 17.33 184.60 2019.0 0.1622 0.6656 0.7119 0.2654 0.4601 0.11890
1 1 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812 ... 24.99 23.41 158.80 1956.0 0.1238 0.1866 0.2416 0.1860 0.2750 0.08902
2 1 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069 ... 23.57 25.53 152.50 1709.0 0.1444 0.4245 0.4504 0.2430 0.3613 0.08758
3 1 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 0.2597 ... 14.91 26.50 98.87 567.7 0.2098 0.8663 0.6869 0.2575 0.6638 0.17300
4 1 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809 ... 22.54 16.67 152.20 1575.0 0.1374 0.2050 0.4000 0.1625 0.2364 0.07678

5 rows × 31 columns
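It is also worth checking the class balance of the target before modeling (this dataset is known to contain 357 benign and 212 malignant samples); a one-line check:

# count benign (0) vs. malignant (1) cases
print(df1['diagnosis'].value_counts())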

From the above we can see that, apart from "diagnosis", all other features are numerical values (float) whose ranges differ widely. To model the data effectively, I am going to rescale the features.

In [6]:
from sklearn import preprocessing

# standardize every column to zero mean and unit variance
df2 = pd.DataFrame(preprocessing.scale(df1.iloc[:,0:31]))
df2.columns = list(df1.iloc[:,0:31].columns)
# restore the original 0/1 labels, which should not be scaled
df2['diagnosis'] = df1['diagnosis']

df2.head()
Out[6]:
diagnosis radius_mean texture_mean perimeter_mean area_mean smoothness_mean compactness_mean concavity_mean concave points_mean symmetry_mean ... radius_worst texture_worst perimeter_worst area_worst smoothness_worst compactness_worst concavity_worst concave points_worst symmetry_worst fractal_dimension_worst
0 1 1.097064 -2.073335 1.269934 0.984375 1.568466 3.283515 2.652874 2.532475 2.217515 ... 1.886690 -1.359293 2.303601 2.001237 1.307686 2.616665 2.109526 2.296076 2.750622 1.937015
1 1 1.829821 -0.353632 1.685955 1.908708 -0.826962 -0.487072 -0.023846 0.548144 0.001392 ... 1.805927 -0.369203 1.535126 1.890489 -0.375612 -0.430444 -0.146749 1.087084 -0.243890 0.281190
2 1 1.579888 0.456187 1.566503 1.558884 0.942210 1.052926 1.363478 2.037231 0.939685 ... 1.511870 -0.023974 1.347475 1.456285 0.527407 1.082932 0.854974 1.955000 1.152255 0.201391
3 1 -0.768909 0.253732 -0.592687 -0.764464 3.283553 3.402909 1.915897 1.451707 2.867383 ... -0.281464 0.133984 -0.249939 -0.550021 3.394275 3.893397 1.989588 2.175786 6.046041 4.935010
4 1 1.750297 -1.151816 1.776573 1.826229 0.280372 0.539340 1.371011 1.428493 -0.009560 ... 1.298575 -1.466770 1.338539 1.220724 0.220556 -0.313395 0.613179 0.729259 -0.868353 -0.397100

5 rows × 31 columns
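One caveat worth noting: preprocessing.scale computes its statistics from the full dataset, so information from the eventual test split leaks into the scaling. A stricter variant fits the scaler on the training split only; a minimal sketch (variable names are illustrative):

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# fit the scaler on the training split only, then reuse it on the test split
X_raw = df1.iloc[:, 1:31]  # the 30 numeric features
y_raw = df1['diagnosis']
X_tr, X_te, y_tr, y_te = train_test_split(X_raw, y_raw, random_state=0)
scaler = StandardScaler().fit(X_tr)   # statistics come from the training data only
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)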

Now let's check the correlations between features so that we can address multicollinearity if it exists.

In [7]:
import seaborn as sns
import matplotlib.pyplot as plt

# correlation heatmap of the 30 features
fig, ax = plt.subplots(figsize=(30,30))
sns.set(font_scale=1.5)
sns.heatmap(df2.iloc[:,1:31].corr(), cbar=True, fmt='.2f', annot=True)
plt.show()
In [8]:
# pairwise scatter plots of the 30 features
sns.pairplot(df2.iloc[:,1:31])
plt.show()

Clearly, there is strong collinearity between some features (these pairs can also be extracted programmatically; see the sketch after this list). For example, the highest correlations are between:

  1. perimeter_mean, area_mean and radius_mean;
  2. perimeter_worst, area_worst and radius_worst;
  3. perimeter_se, area_se and radius_se.
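A minimal sketch for pulling such pairs out of the correlation matrix (the 0.9 threshold is an arbitrary illustrative choice):

import numpy as np

# list feature pairs whose absolute correlation exceeds 0.9
corr = df2.iloc[:, 1:31].corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))  # keep upper triangle only
pairs = upper.stack().sort_values(ascending=False)                  # (feature_a, feature_b) -> corr
print(pairs[pairs > 0.9])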

This multicollinearity can cause machine learning models to fail. To reduce it, we apply PCA.

In [9]:
from sklearn.decomposition import PCA
import numpy as np
# determine how many components are needed to describe the data
# (fit on the 30 feature columns only, excluding the label)
pca = PCA().fit(df2.iloc[:,1:31])
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance');
plt.show()
In [12]:
# keep enough components to explain 90% of the variance
# (note: this fit runs over columns 0:31 and so includes the label column,
# which is why its ratios differ slightly from the feature-only run below)
pca = PCA(n_components=0.9)
pcanew = pca.fit_transform(df2.iloc[:,0:31])
print(pcanew.shape)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())

# keep enough components to explain 95% of the variance, using the features only
pca = PCA(n_components=0.95)
pcanew = pca.fit_transform(df2.iloc[:,1:31])
print(pcanew.shape)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())
(569, 7)
[ 0.444103    0.18851848  0.0934184   0.06563924  0.05460987  0.03993509
  0.02238485]
0.908608940042
(569, 10)
[ 0.44272026  0.18971182  0.09393163  0.06602135  0.05495768  0.04024522
  0.02250734  0.01588724  0.01389649  0.01168978]
0.951568814337

Therefore, to explain over 90% of the variance we need to include the first 7 components; to explain over 95% of the variance we need the first 10 components.
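These component counts can also be read off programmatically from the cumulative explained-variance curve; a small sketch:

# number of components needed to reach a given cumulative-variance threshold
cumvar = np.cumsum(PCA().fit(df2.iloc[:,1:31]).explained_variance_ratio_)
for threshold in (0.90, 0.95):
    n = int(np.argmax(cumvar >= threshold)) + 1  # first component index crossing the threshold
    print('%.0f%% variance: first %d components' % (threshold * 100, n))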

2. Applying machine learning models

a. Logistic Regression

In [42]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix


# target vector (0 = benign, 1 = malignant) and the PCA-transformed features
y = df2.iloc[:,0:1].values
print(pcanew.shape)
print(y.shape)
X = pcanew
print(X.shape)
X_train, X_test, y_train, y_test = train_test_split(X, np.ravel(y))

# train the scikit-learn logistic regression model
clf = LogisticRegression()
clf.fit(X_train, y_train)
print('score Scikit learn: ', clf.score(X_test, y_test))
# prediction
y_pred = clf.predict(X_test)

#computing and plotting confusion matrix

c_m = confusion_matrix(y_test,y_pred)
print('Logistic Regression:\nconfusion matrix\n', c_m,'\n\n')
ax=plt.matshow(c_m,cmap=plt.cm.Blues)
print('Confusion matrix plot of Logistic regression')
plt.colorbar(ax)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()
# classification report

print('\n Classification report \n',classification_report(y_test, y_pred))
(569, 10)
(569, 1)
(569, 10)
score Scikit learn:  0.965034965035
Logistic Regression:
confusion matrix
 [[89  1]
 [ 4 49]] 


Confusion matrix plot of Logistic regression
 Classification report 
              precision    recall  f1-score   support

          0       0.96      0.99      0.97        90
          1       0.98      0.92      0.95        53

avg / total       0.97      0.97      0.96       143
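A single train/test split can be noisy, so as a sanity check the logistic-regression accuracy can also be estimated with k-fold cross-validation; a minimal sketch:

from sklearn.model_selection import cross_val_score

# 5-fold cross-validated accuracy on the PCA-transformed features
scores = cross_val_score(LogisticRegression(), X, np.ravel(y), cv=5)
print('CV accuracy: %.3f +/- %.3f' % (scores.mean(), scores.std()))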

b. Support Vector Machines (SVM)

In [46]:
# Support Vector Classification (SVC)
# Fit the SVC model on the training data and predict on the test data.
# The radial basis function (RBF) is a commonly used kernel.
# gamma is a parameter of the RBF kernel and can be thought of as the 'spread' of
# the kernel and therefore of the decision region. When gamma is low, the 'curve'
# of the decision boundary is very low and the decision region is very broad.
# When gamma is high, the 'curve' of the decision boundary is high, which creates
# islands of decision boundaries around data points.
# C is a parameter of the SVC learner and is the penalty for misclassifying a
# data point. When C is small, the classifier tolerates misclassified points
# (high bias, low variance). When C is large, the classifier is heavily penalized
# for misclassified data and therefore bends over backwards to avoid any
# misclassified data points (low bias, high variance).
from sklearn.svm import SVC
svc = SVC(C=100, gamma=0.001, kernel='rbf', probability=True)
svc.fit(X_train, y_train)
y_pred_svc = svc.predict(X_test)
# computing and plotting confusion matrix
c_m = confusion_matrix(y_test, y_pred_svc)
print('SVC:\n confusion matrix\n', c_m,'\n\n')
ax=plt.matshow(c_m,cmap=plt.cm.Blues)
print('Confusion matrix plot of SVC')
plt.colorbar(ax)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()
# classification report
print('\n Classification report \n',classification_report(y_test, y_pred_svc))
print ('#############################################################################')
SVC:
 confusion matrix
 [[89  1]
 [ 4 49]] 


Confusion matrix plot of SVC
 Classification report 
              precision    recall  f1-score   support

          0       0.96      0.99      0.97        90
          1       0.98      0.92      0.95        53

avg / total       0.97      0.97      0.96       143

#############################################################################
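The C=100 and gamma=0.001 values above were fixed by hand; a common refinement is to tune them with a cross-validated grid search. A minimal sketch (the grid values are illustrative):

from sklearn.model_selection import GridSearchCV

# search a small grid of C / gamma values with 5-fold cross-validation
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.0001, 0.001, 0.01, 0.1]}
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
grid.fit(X_train, y_train)
print('best parameters:', grid.best_params_)
print('best CV accuracy:', grid.best_score_)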

Conclusion: The feature analysis shows that some features are strongly collinear, for example concave points_worst, concavity_worst, concavity_mean, perimeter_worst, area_worst, radius_worst, perimeter_mean, area_mean and radius_mean; the PCA analysis confirmed these observations. After removing this multicollinearity, we were able to predict the malignant and benign tumors with high accuracy using different models. As the results show, SVC and logistic regression perform equally well.
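As a quick, threshold-independent check of that last claim, the two fitted models can also be compared by ROC AUC on the held-out test set; a minimal sketch (both models expose predict_proba, since the SVC was created with probability=True):

from sklearn.metrics import roc_auc_score

# compare the two fitted models by area under the ROC curve
for name, model in [('logistic regression', clf), ('SVC', svc)]:
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print('%s ROC AUC: %.3f' % (name, auc))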